Abstract

In this paper we introduce the notion of approximate 
data structures, in which a small amount of error is 
tolerated in the output. Approximate data structures 
trade error of approximation for faster operation, lead- 
ing to theoretical and practical speedups for a wide va- 
riety of algorithms. We give approximate variants of the 
van Emde Boas data structure, which support the same 
dynamic operations as the standard van Emde Boas 
data structure |2^, ^|, except that answers to queries 
are approximate. The variants support all operations 
in constant time provided the error of approximation is
1/polylog(n), and in O(log log n) time provided the error
is 1/polynomial(n), for n elements in the data structure.

We consider the tolerance of prototypical algo- 
rithms to approximate data structures. We study in 
particular Prim's minimum spanning tree algorithm, Di- 
jkstra's single-source shortest paths algorithm, and an 
on-line variant of Graham's convex hull algorithm. To 
obtain output which approximates the desired output 
with the error of approximation tending to zero, Prim's 
algorithm requires only linear time, Dijkstra's algorithm 
requires O(m log log n) time, and the on-line variant of
Graham's algorithm requires constant amortized time 
per operation. 

1 Introduction 

The van Emde Boas data structure (veb) |||, ||(J 
represents an ordered multiset of integers. The data 
structure supports query operations for the current
minimum and maximum element, the predecessor and
successor of a given element, and the element closest to
a given number, as well as the operations of insertion
and deletion. Each operation requires O(log log U) time,
where the elements are taken from a universe {0, ..., U}.

We give variants of the VEB data structure that are
faster than the original VEB, but only guarantee approximately
correct answers. The notion of approximation
is the following: the operations are guaranteed to be
consistent with the behavior of the corresponding exact
data structure that operates on the elements after they
are mapped by a fixed function f. For the multiplicatively
approximate variant, the function f preserves the
order of any two elements differing by at least a factor
of some 1 + ε. For the additively approximate variant,
the function f preserves the order of any two elements
differing additively by at least some Δ.

Let the elements be taken from a universe [1, U]. On
an arithmetic RAM with b-bit words, the times required
per operation in our approximate data structures are as
follows:





    multiplicative approximation (1 + ε):   O(log log_b (log(U)/ε))
    additive approximation Δ:               O(log log_b (U/Δ))
Under the standard assumption that b = Ω(log U +
log n), where n is the measure of input size, the time
required is as follows:



    ε, Δ/U     1/polylog(nU)     1/exp(polylog(n))
    time       O(1)              O(log log n)



The space requirements of our data structures are
O(log(U)/ε) and O(U/Δ), respectively. The space can
be reduced to close to linear in the number of elements
by using dynamic hashing. Specifically, the space
needed is O(|S| + |f(S)| · t), where S is the set of elements,
f is the fixed function mapping the elements
of S (hence, |f(S)| is the number of distinct elements
under the mapping), and t is the time required per op- 
eration. The overhead incurred by using dynamic hash- 
ing is constant per memory access with high probabil- 
ity H §)■ Thus, if the data structures are implemented 
to use nearly linear space, the times given per operation 
hold only with high probability. 

1.1 Description of the data structure. The approach
is simple to explain, and we illustrate it for the
multiplicative variant with ε = 1 and b = 1 + ⌊log U⌋.
Let f(i) = ⌊log_2 i⌋ (the index of i's most significant bit).
The mapping preserves the order of any two elements
differing by more than a factor of two and effectively
reduces the universe size to U' = 1 + ⌊log U⌋. On an
arithmetic RAM with b-bit words, a bit-vector for the
mapped elements fits in a single word, so that successor
and predecessor queries can be computed with a few
bitwise and arithmetic operations. The only additional
structures are a linked list of the elements and a dictionary
mapping bit indices to list elements.
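
The ε = 1 case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a Python integer stands in for the b-bit word, the class and method names are hypothetical, the linked list is omitted, and the dictionary keeps just one representative element per bit index.

```python
class OneWordApproxSet:
    """Multiplicative variant with eps = 1: elements are bucketed by their MSB."""

    def __init__(self):
        self.word = 0   # bit l is set iff some stored i has floor(log2 i) == l
        self.rep = {}   # dictionary: bit index -> one stored element of that bucket

    def insert(self, i: int) -> None:
        l = i.bit_length() - 1           # f(i) = floor(log2 i), for i >= 1
        self.word |= 1 << l
        self.rep.setdefault(l, i)        # keep the first element seen in the bucket

    def successor(self, i: int):
        """An element from the lowest non-empty bucket above i's bucket, or None."""
        l = i.bit_length() - 1
        higher = self.word >> (l + 1) << (l + 1)      # clear bits 0..l
        if higher == 0:
            return None
        lowest = (higher & -higher).bit_length() - 1  # index of lowest set bit
        return self.rep[lowest]

    def predecessor(self, i: int):
        """An element from the highest non-empty bucket below i's bucket, or None."""
        l = i.bit_length() - 1
        lower = self.word & ((1 << l) - 1)            # keep bits 0..l-1
        if lower == 0:
            return None
        return self.rep[lower.bit_length() - 1]
```

Each query touches the word a constant number of times. The answer may differ from the exact successor or predecessor, but only by elements that share a bucket, that is, elements within a factor of two of each other.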

In general, each of the approximate problems with
universe size U reduces to the exact problem with a
smaller universe size U'. For the case of multiplicative
approximation we have size

    U' = 2 log_2(U)/ε = O(log_{1+ε} U),

and for the case of additive approximation

    U' = U/Δ.
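
For the additive case the reduction is easy to make concrete. The sketch below assumes integer elements and an integer Δ ≥ 1; the function names are hypothetical and chosen only for this illustration.

```python
def additive_key(x: int, delta: int) -> int:
    """Map x to its bucket index x div delta (assumes x >= 0 and delta >= 1)."""
    return x // delta


def reduced_universe_size(U: int, delta: int) -> int:
    """Number of distinct keys that elements of {0, ..., U} can map to."""
    return U // delta + 1
```

With U = 1000 and Δ = 10 the mapped universe has 101 keys, and any two elements that differ by at least 10 receive distinct keys in the correct order.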

Each reduction is effectively reversible, yielding an 
equivalence between each approximate problem and the 
exact problem with a smaller universe. The equivalence 
holds generally for any numeric data type whose seman- 
tics depend only on the ordering of the elements. The 
equivalence has an alternate interpretation: each ap- 
proximate problem is equivalent to the exact problem on 
a machine with larger words. Thus, it precludes faster 
approximate variants that don't take advantage of fast 
operations on words. 

For universe sizes bigger than the number of bits
in a word, we apply the recursive divide-and-conquer
approach from the original VEB data structure. Each
operation on a universe of size V reduces to a single
operation on a universe of size √V plus a few constant-time
operations. When the universe size is b, only
a small constant number of arithmetic and bitwise
operations are required. This gives a running time
of O(log log_b U'), where U' is the effective universe
size after applying the universe reduction from the
approximate to the exact problem.
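
To see where the log log_b U' bound comes from, one can count how many square-root reductions are needed before the universe fits in one word. The helper below is a hypothetical illustration (its name and interface are not from the paper) and assumes b ≥ 2.

```python
import math


def recursion_levels(universe: int, b: int) -> int:
    """Count square-root reductions until the universe size is at most b."""
    levels = 0
    while universe > b:
        universe = math.isqrt(universe - 1) + 1   # ceil(sqrt(universe))
        levels += 1
    return levels
```

For a universe of size b^(2^j) the count is exactly j, which is log_2 log_b of the universe size.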



1.2 Outline. In the next section we motivate our
development of approximate VEB data structures by
demonstrating how they can be used in three well-known
algorithms: Prim's algorithm for minimum spanning
trees, Dijkstra's shortest paths algorithm, and an
on-line version of the Graham scan for finding convex
hulls. Related work is discussed in Section 3. Our model
of computation is defined in Section 4. In Section 5, we
show how to construct our approximate VEB data structures
and we analyze their characteristics. We make
concluding remarks in Section 6.

2 Applications 

We consider three prototypical applications: to min- 
imum spanning trees, to single-source shortest paths, 
and to semi-dynamic on-line convex hulls. Our approx- 
imate minimum spanning tree algorithm runs in lin- 
ear time and is arguably simpler and more practical 
than the two known linear-time MST algorithms. Our 
approximate single-source shortest paths algorithm is 
faster than any known algorithm on sparse graphs. Our 
on-line convex hull algorithm is also the fastest known 
in its class; previously known techniques require pre- 
processing and thus are not suitable for on-line or dy- 
namic problems. The first two applications are obtained 
by substituting our data structures into standard, well- 
known algorithms. The third is obtained by a straight- 
forward adaptation of an existing algorithm to the on- 
line case. These examples are considered mainly as pro- 
totypical applications. In general, approximate data 
structures can be used in place of any exact counter- 
part. 

Our results below assume a RAM with a logarithmic
word size as our model of computation, described in
more detail in Section 4. The proofs are simple and are
given in the full paper.

2.1 Minimum spanning trees. For the minimum 
spanning tree problem, we show the following result 
about the performance of Prim's algorithm jl6| 25, 0| 



when our approximate veb data structure is used to 
implement the priority queue: 

Theorem 2.1. Given a graph with edge weights in
{0, ..., U}, Prim's algorithm, when implemented with
our approximate VEB with multiplicative error (1 + ε),
finds a (1 + ε)-approximate minimum spanning tree
in an n-node, m-edge graph in O((n + m) log(1 +
(log(1/ε))/log log(nU))) time.



For 1/ε < polylog(nU), Theorem 2.1 gives a linear-time
algorithm. This algorithm is arguably simpler
and more practical than the two known linear-time
MST algorithms. This application is a prototypical 
example for which the use of an approximate data 
structure is equivalent to slightly perturbing the input. 
Approximate data structures can be "plugged in" to 
such algorithms without modifying the algorithm. 

2.2 Shortest paths. For the single-source shortest 
paths problem, we get the following result by using an 
approximate VEB data structure as a priority queue in 
Dijkstra's algorithm (see, e.g., [27, Thm 7.6]):

Theorem 2.2. Given a graph with edge weights in
{0, ..., U} and any 0 < ε < 2, Dijkstra's algorithm,
when implemented with our approximate VEB with multiplicative
error (1 + ε/(2n)), computes single-source
shortest path distances within a factor of (1 + ε) in
O((n + m) log(log(n/ε)/log log U)) time.

If log(1/ε) < polylog(n) log log U, the algorithm
runs in O((n + m) log log n) time (faster than any
known algorithm on sparse graphs) and is simpler
than theoretically competitive algorithms. This is a
prototypical example of an algorithm for which the
error increases by the multiplicative factor at each step.
If such an algorithm runs in polynomial time, then
O(log log n) time per VEB operation can be obtained
with insignificant net error. Again, this speed-up can be
obtained with no adaptation of the original algorithm.
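
The "plug in" idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: a binary heap from the standard library stands in for the approximate VEB, the approximation is simulated by ordering the queue on a bucketed key ⌊log_{1+δ} d⌋ with δ = ε/(2n), and edge weights are assumed to be positive. The names approx_dijkstra and bucket are hypothetical.

```python
import heapq
import math


def approx_dijkstra(graph, source, eps):
    """graph: dict vertex -> list of (neighbor, weight); weights assumed positive."""
    n = len(graph)
    delta = eps / (2 * n)            # per-operation error, as in Theorem 2.2

    def bucket(d):
        # Keys within a factor (1 + delta) of each other may share a bucket,
        # so the queue only distinguishes distances approximately.
        return -1 if d == 0 else math.floor(math.log(d) / math.log1p(delta))

    dist = {source: 0}
    scanned = set()
    heap = [(bucket(0), source)]
    while heap:
        _, v = heapq.heappop(heap)
        if v in scanned:
            continue                 # stale queue entry
        scanned.add(v)
        for w, weight in graph[v]:
            nd = dist[v] + weight
            if w not in scanned and nd < dist.get(w, math.inf):
                dist[w] = nd
                heapq.heappush(heap, (bucket(nd), w))
    return dist
```

The body is ordinary Dijkstra; only the ordering key of the queue changes. Ties inside a bucket are broken by vertex name rather than by distance, which is the kind of bounded misordering an approximate priority queue is allowed to make.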



Analysis. The proof of Theorem 2.2 follows the proof
of the exact shortest paths algorithm (see, e.g., [27,
Thm 7.6]). The crux of the proof is an inductive claim,
saying that any vertex w that becomes labeled during or
after the scanning of a vertex v also satisfies dist(w) ≥
dist(v), where dist(w) is a so-called tentative distance
from the source to w. When using a (1 + ε/(2n))-approximate
VEB data structure to implement the priority queue, the
inductive claim is replaced by

    dist(w) ≥ dist(v)/(1 + ε/(2n))^i,

where vertex v is the ith vertex to be scanned. Thus,
the accumulated multiplicative error is bounded by

    (1 + ε/(2n))^n < e^{ε/2} < (1 + ε).

We leave the details to the full paper, and only note that 
it is not difficult to devise an example where the error 
is actually accumulated exponentially at each iteration. 
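
The bound on the accumulated error can be checked directly. The following worked derivation is an added illustration that uses only the standard inequality 1 + x ≤ e^x, where e denotes the base of the natural logarithm:

```latex
\[
\left(1 + \frac{\epsilon}{2n}\right)^{n}
  \le \left(e^{\epsilon/(2n)}\right)^{n}
  = e^{\epsilon/2}
  \le 1 + \epsilon
  \qquad \text{for } 0 < \epsilon \le 2 .
\]
```

The last step holds because g(ε) = 1 + ε − e^{ε/2} is concave with g(0) = 0 and g(2) = 3 − e > 0, so g is nonnegative on the whole interval [0, 2].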

2.3 On-line convex hull. Finally, we consider the
semi-dynamic on-line convex hull problem. In this
problem, a set of planar points is processed in sequence.
After each point is processed, the convex hull of the
points given so far must be computed. Queries of the
form "is x in the current hull?" can also be given at any
time. For the approximate version, the hull computed
and the answers given must be consistent with a (1 + Δ)-approximate
hull, which is contained within the true
convex hull such that the distance of any point on the
true hull to the approximate hull is O(Δ) times the
diameter.

We show the following result about the Graham
scan algorithm [12] when run using our approximate
VEB data structure:

Theorem 2.3. The on-line (1 + Δ)-approximate convex
hull can be computed by a Graham scan in constant
amortized time per update if Δ > log^{-c} n for any fixed
c > 0, and in O(log log n) amortized time per update if
Δ > n^{-c}.

This represents the first constant-amortized-time- 
per-query approximation algorithm for the on-line prob- 
lem. This example demonstrates the usefulness of ap- 
proximate data structures for dynamic/on-line prob- 
lems. Related approximate sorting techniques require 
preprocessing, which precludes their use for on-line 
problems. 

Analysis. Graham's scan algorithm is based on scan- 
ning the points according to an order determined by 
their polar representation, relative to a point that is in 
the convex hull, and maintaining the convex hull via 
local corrections. We adapt Graham's scan to obtain 
our on-line algorithm, as sketched below. As an invari- 
ant, we have a set of points that are in the intermediate 
convex hull, stored in an approximate VEB according to 
their angular coordinates. The universe is [0, 2π] with an
additive error Δ, which can be interpreted as the perturbation
error of points in their angular coordinate,
without changing their values in the distance coordinates.
This results in point displacements of at most
(1 + Δ) times the diameter of the convex hull.

Given a new point, its successor and predecessor 
in the VEB are found, and the operations required to 
check the convex hull and, if necessary, to correct it
are carried out, as in Graham's algorithm [12]. These
operations may include the insertion of the new point 
into the VEB (if the point is on the convex hull) and the 
possible deletion of other points. Since each point can 
only be deleted once from the convex hull, the amortized 
number of VEB operations per point is constant. 

3 Related work 

Our work was inspired by and improves upon data 
structures developed for use in dynamic random variate 
generation by Matias, Vitter, and Ni [19].

Approximation techniques such as rounding and 
bucketing have been widely used in algorithm design. 
This is the first work we know of that gives a general- 
purpose approximate data structure. 

Finite precision arithmetic. The sensitivity of 
algorithms to approximate data structures is related 
in spirit to the challenging problems that arise from 
various types of error in numeric computations. Such 
errors have been studied, for example, in the context of
computational geometry |, |,p|, || f|J §|, |§. We
discuss this further in Section 6.

Approximate sorting. Bern, Karloff, Raghavan, and 
Schieber || introduced approximate sorting and applied 
it to several geometric problems. Their results include 
an O((n log log n)/ε)-time algorithm that finds a (1 + ε)-
approximate Euclidean minimum spanning tree. They
also gave an O(n)-time algorithm that finds a (1 + Δ)-
approximate convex hull for any Δ > 1/polynomial.

In a loose sense, approximate VEB data structures
generalize approximate sorting. The advantages of an 
approximate VEB are the following. An approximate 
VEB bounds the error for each element individually. 
Thus, an approximate VEB is applicable for problems 
such as the general minimum spanning tree problem, 
for which the answer depends on only a subset of the 
elements. The approximate sort of Bern et al. bounds 
the net error, which is not sufficient for such problems. 
More importantly, a VEB is dynamic, so is applicable 
to dynamic problems such as on-line convex hull and 
in algorithms such as Dijkstra's algorithm in which 
the elements to be ordered are not known in advance. 
Sorting requires precomputation, so is not applicable to 
such problems. 

Convex hull algorithms. There are several relevant 
works for the on-line convex hull problem. Shamos (see, 
e.g., |2(|) gave an on-line algorithm for (exact) convex 
hull that takes O(log n) amortized time per update step.
Preparata [24] gave a real-time on-line (exact) convex
hull algorithm with O(log n) worst-case time per
update step. Bentley, Faust, and Preparata [2] give
an O(n + 1/Δ)-time algorithm that finds a (1 + Δ)-
approximate convex hull. Their result was superseded
by the result of Bern et al. mentioned above. Janardan
[15] gave an algorithm maintaining a fully dynamic
(1 + Δ)-approximate convex hull (allowing deletion of
points) in O(log(n)/Δ) time per request. Our on-line
approximation algorithm is based on Graham's scan
algorithm [12] and can be viewed as a combination of
the algorithms by Shamos and by Bentley et al., with
the replacement of an exact VEB data structure by an
approximate variant.



Computation with large words. Kirkpatrick and
Reich [17] considered exact sorting with large words,
giving upper and lower bounds. Their interest was
theoretical, but Lemma 5.1, which in some sense says
that maintaining an approximate VEB data structure
is equivalent to maintaining an exact counterpart using
larger words, suggests that lower bounds on computations
with large words are relevant to approximate
sorting and data structures.

Exploiting the power of RAM. Fredman and
Willard have considered a number of data structures
taking advantage of arithmetic and bitwise operations
on words of size O(log U). In pZ] , they presented the
fusion tree data structure. Briefly, fusion trees implement
the VEB data type in time O(log n / log log n).
They also presented an atomic heap data structure pd| ]
based on their fusion tree and used it to obtain a linear-time
minimum spanning tree algorithm and an O(m +
n log n / log log n)-time single-source shortest paths algorithm.
Willard [29] also considered similar applications
to related geometric and searching problems. Generally,
these works assume a machine model similar to ours and
demonstrate remarkable theoretical consequences of the
model. On the other hand, they are more complicated
and involve larger constants.

Subsequent to our work, Klein and Tarjan recently
announced a randomized minimum spanning tree algorithm
that requires only expected linear time [18].
Arguably, our algorithm is simpler and more practical.

4 Model of computation 

The model of computation assumed in this paper is
a modernized version of the random access machine
(RAM). Many RAM models of a similar nature have
been defined in the literature, dating back to the
early 1960s [Q. Our RAM model is a realistic variant
of the logarithmic-cost RAM: the model assumes
constant-time exact binary integer arithmetic (+, −,
×, div), bitwise operations (left-shift, right-shift,
bitwise-xor, bitwise-and), and addressing operations
on words of size b. Put another way, the word size
of the RAM is b. We assume that numbers are of
the form i + j/2^b, where i and j are integers with
0 ≤ i, j < 2^b, and that the numbers are represented with
two words, the first holding i and the second holding
j. For simplicity of exposition, we use the "most-significant-bit"
function MSB(x) = ⌊log_2 x⌋; it can be
implemented in small constant time via the previously
mentioned operations and has lower circuit complexity
than, e.g., division.

5 Fast approximate data structures

This section gives the details of our approximate VEB 
data structure. First we give the relevant semantics 
and notations. The operations supported are: 

    N ← Insert(x, d),
    Delete(N),
    N ← Search(x),
    N ← Minimum(),
    N ← Maximum(),
    N ← Predecessor(N),
    N ← Successor(N),
    d ← Data(N), and
    x ← Element(N).

The Insert operation and the query operations return
the name N of the element in question. The name is
just a pointer into the data structure allowing constant-time
access to the element. Subsequent operations on
the element are passed this pointer so they can access
the element in constant time. Insert takes an additional
parameter d, an arbitrary auxiliary data item.
Search(x), where x is a real number (but not necessarily
an element), returns the name of the largest element
less than or equal to x. For the approximate
variants, the query operations are approximate in that
the element returned by the query is within a (1 + ε)
relative factor or a Δ absolute amount of the correct
value. Operations Element(N) and Data(N), given
an element's name N, return the element and its data
item, respectively.

The universe (specified by U) and, for the approximate
variants, the error of approximation (ε or Δ) are
specified when the data structure is instantiated.

5.1 Equivalence of various approximations. 

The lemma below assumes a logarithmic word-size RAM. 
The notion of equivalence between data structures is 
that, given one of the data structures, the other can be 
simulated with constant-time overhead per operation. 

Lemma 5.1. The problem of representing a multiplicatively
(1 + ε)-approximate VEB on universe [1, U] is
equivalent to the problem of representing an exact VEB
on universe {0, 1, ..., O(log_{1+ε} U)}.

The problem of representing an additively Δ-approximate
VEB on universe [0, U] is equivalent to
the problem of representing an exact VEB on universe
{0, 1, ..., O(U/Δ)}.



Proof. Assume we have a data structure for the exact
data type on the specified universe. To simulate the
multiplicatively approximate data structure, the natural
mapping to apply to the elements (as discussed
previously) is x ↦ ⌊log_{1+ε} x⌋. Instead, we map x to
approximately log_{1+ε} x ≈ (log_2 x)/ε and we use a
mapping that is faster to compute: Let k = ⌈log_2(1/ε)⌉, let
x = i + j/2^b, and let ℓ = MSB(i). We use the mapping
f that maps x to

    (ℓ left-shift k)
      bitwise-or ((i right-shift (ℓ − k)) bitwise-xor (1 left-shift k))
      bitwise-or (j right-shift (b + ℓ − k)).

If ℓ < k, then to right-shift by (ℓ − k) means to left-shift
by (k − ℓ). Note that in this case the fractional part of
x is shifted in.

This mapping effectively maps x to the lexicographically
ordered pair (MSB(x), y), where y represents the
bits with indices (ℓ − 1) through (ℓ − k) in x. The first
part of the tuple distinguishes between any two x values
that differ in their most significant bit. If two x values
have MSB(x) = ℓ, then it suffices to distinguish them
if they differ additively by 2^{ℓ−k}. The second part of the
tuple suffices for this.

Note that f(1) = 0 and f(U) < 2^{k+1} log_2 U =
O(log_{1+ε} U). This shows one direction of the first part.
The other direction of the first part is easily shown by
essentially inverting the above mapping, so that distinct
elements map to elements that differ by at least a factor
of 1 + ε. Finally, the second part follows by taking the
mapping (x ↦ x div Δ) and its inverse.
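
The multiplicative mapping can be sketched in Python for integer elements. This is a simplified illustration, not the full mapping of the proof: the fractional part j is assumed to be zero, so its term is dropped, and the name approx_key is hypothetical.

```python
import math


def approx_key(i: int, eps: float) -> int:
    """Pack (MSB(i), the k bits of i below its leading 1) into one integer, i >= 1."""
    k = max(0, math.ceil(math.log2(1 / eps)))
    l = i.bit_length() - 1                  # MSB(i) = floor(log2 i)
    # Leading 1 followed by the next k bits; shift left instead when l < k.
    top = i >> (l - k) if l >= k else i << (k - l)
    return (l << k) | (top ^ (1 << k))      # clear the leading 1, prepend l
```

Two integers that receive the same key share their most significant bit ℓ and differ by less than 2^{ℓ−k}, which is at most ε times the smaller one. Since the key is also nondecreasing in i, any two elements that differ by a factor of at least 1 + ε keep their order.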



5.2 Implementations. Lemma 5.1 reduces the approximate
problems to the exact problem with smaller
universe size. This section gives an appropriate solution
to the exact problem. If an approximate variant
is to be implemented, we assume the elements have already
been mapped by the constant-time function f in
Lemma 5.1. The model of computation is a RAM with
b-bit words.

A dictionary data structure supports update operations
Set(key, value) and Unset(key) and query operation
Look-Up(key) (returning the value, if any, associated
with the key). It is well known how to implement
a dictionary by hashing in space proportional to
the number of elements in the dictionary or in an array
of size proportional to the key space. In either case,
all dictionary operations require only constant time. In
the former case, the time is constant with high probability
|P, ||; in the latter case, a well-known trick is
required to instantiate the dictionary in constant time.

Each instance of our data structure will have a
doubly-linked list of element/datum pairs. The list is






Matias, Vitter, & Young 



ordered by the ordering induced by the elements. The 
name of each element is a pointer to its record in this 
list. 

If the set to be stored is a multiset, as will generally 
be the case in simulating an approximate variant, then 
the elements will be replaced by buckets, which are 
doubly-linked lists holding the multiple occurrences of 
an element. Each occurrence holds a pointer to its 
bucket. In this case the name of each element is a 
pointer to its record within its bucket. 

Each instance will also have a dictionary mapping 
each element in the set to its name. If the set is 
a multiset, it will map each element to its bucket. 
In general, the universe, determined when the data
structure is instantiated, is of the form {L, ..., U}. Each
instance records the appropriate L and U values and
subtracts L from each element, so that the effective
universe is {0, ..., U − L}.

The ordered list and the dictionary suffice to
support constant-time Predecessor, Successor,
Minimum, and Maximum operations. The other operations
use the list and dictionary as follows. Insert(i)
finds the predecessor-to-be of i by calling Search(i),
inserts i into the list after the predecessor, and updates
the dictionary. If S is a multiset, i is inserted instead
into its bucket and the dictionary is updated only if the
bucket didn't previously exist. Delete(N) deletes the
element from the list (or from its bucket) and updates
the dictionary appropriately.

How Search works depends on the size of the uni- 
verse. The remainder of this section describes Search 
queries and how Insert and Delete maintain the ad- 
ditional structure needed to support Search queries. 

5.3 Bit-vectors. For a universe of size b, the
additional structure required is a single b-bit word w.
As described in Section 1.1, the word represents a bit
vector; the ith bit is 1 iff the dictionary contains an
element i. Insert sets this bit; Delete unsets it if no
occurrences of i remain in the set. Setting or unsetting
bits can be done with a few constant-time operations.

The Search(i) operation is implemented as follows. 
If the list is empty or i is less than the minimum element, 
return nil. Otherwise, let 

j <- MSB(w bitwise-and ((1 left-shift i) - 1)) , 

i.e., let j be the index of the most significant 1-bit in w 
that is at most as significant as the ith bit. Return j's 
name from the dictionary. 
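
A minimal Python sketch of the one-word structure follows. It makes several assumptions that are not in the text: a Python integer plays the role of the word w, a plain dict of occurrence counts replaces the dictionary and the buckets, and Search returns the element itself rather than its name. The mask (1 << (i + 1)) − 1 is used so that bit i itself is included, which is one reading of "at most as significant as the ith bit" under zero-based bit indices. The class name is hypothetical.

```python
class BitVectorSet:
    """Exact multiset over the universe {0, ..., b - 1}, stored in one word."""

    def __init__(self, b: int):
        self.b = b
        self.w = 0          # bit i is 1 iff at least one occurrence of i is stored
        self.count = {}     # element -> number of occurrences

    def insert(self, i: int) -> None:
        self.count[i] = self.count.get(i, 0) + 1
        self.w |= 1 << i

    def delete(self, i: int) -> None:
        self.count[i] -= 1
        if self.count[i] == 0:          # unset the bit only when no occurrence remains
            del self.count[i]
            self.w &= ~(1 << i)

    def search(self, i: int):
        """Largest stored element <= i, or None if there is none."""
        masked = self.w & ((1 << (i + 1)) - 1)   # keep bits 0..i
        return masked.bit_length() - 1 if masked else None
```

Here int.bit_length() − 1 plays the role of the MSB function.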

Analysis. On universes of size b, all operations require
only a few constant-time operations. If hashing is used
to implement the dictionary, the total space (number
of words) required at any time is proportional to the
number of elements currently in the set.

5.4 Intermediate data structure. The fully recursive
data structure is a straightforward modification
of the original van Emde Boas data structure. For
those not familiar with the original data structure, we
first give an intermediate data structure that is conceptually
simpler as a stepping stone. The additional
data structures to support Search(i) for a universe
{0, 1, ..., b^j − 1} are as follows.

Divide the problem into b + 1 subproblems: if the
current set of elements is S, let S_k denote the set
{i ∈ S : i div b^{j−1} = k}. Inductively maintain a VEB
data structure for each non-empty set S_k. Note that
the universe size for each S_k is b^{j−1}. Each S_k can be a
multiset only if S is.

Let T denote the set {k : S_k not empty}. Inductively
maintain a VEB data structure for the set T. The
datum for each element k is the data structure for S_k.
Note that the universe size for T is b. Note also that T
need not support multi-elements.

Implement Search(i) as follows. If i is in the
dictionary, return i's name. Otherwise, determine k
such that i would be in S_k if i were in S. Recursively
search in T for the largest element k' less than or equal
to k. If k' < k or i is less than the minimum element
of S_k, return the maximum element of S_{k'}. Otherwise,
recursively search for the largest element less than or
equal to i in S_k and return it.

Insert and Delete maintain the additional data
structures as follows. Insert(i) inserts i recursively
into the appropriate S_k. If S_k was previously empty, it
creates the data structure for S_k and recursively inserts
k into T. Delete(N) recursively deletes the element
from the appropriate S_k. If S_k becomes empty, it deletes
k from T.
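
The two-level decomposition can be sketched as follows. This is a simplified illustration under stated assumptions: Python integers serve as words, elements are stored relative to their subproblem (i mod b^{j−1}), and the ordered list, the dictionary of names, multisets, and Delete are all omitted. When i falls below the minimum of its own S_k, the sketch explicitly looks up the next lower non-empty index in T. All names are hypothetical.

```python
class IntermediateVEB:
    """Set of integers in {0, ..., b**j - 1}; Insert and Search only."""

    def __init__(self, b: int, j: int):
        self.b, self.j = b, j
        self.child_size = b ** (j - 1)   # universe size of each S_k
        self.bits = 0        # j == 1: bit vector of elements; j > 1: bit vector for T
        self.children = {}   # k -> structure for the non-empty set S_k
        self.min = None
        self.max = None

    def insert(self, i: int) -> None:
        self.min = i if self.min is None else min(self.min, i)
        self.max = i if self.max is None else max(self.max, i)
        if self.j == 1:
            self.bits |= 1 << i
            return
        k, r = divmod(i, self.child_size)
        if k not in self.children:       # S_k was empty: create it, insert k into T
            self.children[k] = IntermediateVEB(self.b, self.j - 1)
            self.bits |= 1 << k
        self.children[k].insert(r)

    @staticmethod
    def _highest_bit_at_most(word: int, i: int) -> int:
        """Index of the highest set bit of word among bits 0..i, or -1."""
        return (word & ((1 << (i + 1)) - 1)).bit_length() - 1

    def search(self, i: int):
        """Largest stored element <= i, or None."""
        if self.min is None or i < self.min:
            return None
        if self.j == 1:
            return self._highest_bit_at_most(self.bits, i)
        k, r = divmod(i, self.child_size)
        kp = self._highest_bit_at_most(self.bits, k)   # largest k' <= k in T
        if kp == k and r >= self.children[k].min:
            return k * self.child_size + self.children[k].search(r)
        if kp == k:                                    # i lies below min(S_k)
            kp = self._highest_bit_at_most(self.bits, k - 1)
        return kp * self.child_size + self.children[kp].max
```

Each call does constant work on the one-word structure for T and makes at most one recursive call, into a universe smaller by a factor of b.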

Analysis. Because the universe of the set T is of
size b, all operations maintaining T take constant time.
Thus, each Search, Insert, and Delete for a set
with universe of size U = b^j requires a few constant-time
operations and possibly one recursive call on a
universe of size b^{j−1}. Thus, each such operation requires
O(j) = O(log_b U) time.

To analyze the space requirement, note that the size
of the data structure depends only on the elements in
the current set. Assuming hashing is used to implement
the dictionaries, the space required is proportional to
the number of elements in the current set plus the space
that would have been required if the distinct elements
of the current set had simply been inserted into the
data structure. The latter space would be at worst
proportional to the time taken for the insertions. Thus,
the total space required is proportional to the number
of elements plus O(log_b U) times the number of distinct
elements.

5.5 Full recursion. We exponentially decrease the 
above time by balancing the subdivision of the problem 
exactly as is done in the original van Emde Boas data 
structure. 

The first modification is to balance the universe
sizes of the set T and the sets {S_k}. Assume the
universe size is b^{2^j}. Note that b^{2^j} = b^{2^{j−1}} × b^{2^{j−1}}.
Define S_k = {i ∈ S : i div b^{2^{j−1}} = k} and define
T = {k : S_k is not empty}. Note that the universe size
of each S_k and of T is b^{2^{j−1}}.

With this modification, Search, Insert, and
Delete are still well defined. Inspection of Search
shows that if Search finds k in T, it does so in constant
time, and otherwise it does not search recursively
in S_k. Thus, only one non-constant-time recursion is
required, into a universe of size b^{2^{j−1}}. Thus, Search
requires O(j) time.

Insert and Delete, however, do not quite have
this nice property. In the event that S_k was previously
empty, Insert descends recursively into both S_k and T.
Similarly, when S_k becomes empty, Delete descends
recursively into both S_k and T.

The following modification to the data structure
fixes this problem, just as in the original van Emde Boas
data structure. Note that Insert only updates T when
an element is inserted into an empty S_k. Similarly,
Delete only updates T when the last element is deleted
from the set S_k. Modify the data structure (and all
recursive data structures) so that the recursive data
structures exist only when |S| ≥ 2. When |S| = 1, the
single element is simply held in the list. Thus, insertion
into an empty set and deletion from a set of one element
require only constant time. This ensures that if Insert
or Delete spends more than constant time in T, it will
require only constant time in S_k.

This modification requires that when S has one element
and a new element is inserted, Insert instantiates
the recursive data structures and inserts both elements
appropriately. The first element inserted will bring both
T and some S_k to size one; this requires constant time.
If the second element is inserted into the same set S_k as
the first element, T is unchanged. Otherwise, the insertion
into its newly created set S_{k'} requires only constant
time. In either case, only one non-constant-time recursion
is required.

Similarly, when S has two elements and one of
them is deleted, after the appropriate recursive deletions,
Delete destroys the recursive data structures
and leaves the data structure holding just the single remaining
element. If the two elements were in the same
set S_k, then T was already of size one, so only the deletion
from S_k requires more than constant time. Otherwise,
each set S_k and S_{k'} was already of size one, so
only the deletion of the second element from T took
more than constant time.

Analysis. With the two modifications, each Search,
Insert, and Delete for a universe of size U = b^{2^j}
requires at most one non-constant-time recursive call,
on a set with universe size b^{2^{j−1}}. Thus, the time required
for each operation is O(j) = O(log log_b U). As for the
intermediate data structure, the total space is at worst
proportional to the number of elements, plus the time
per operation (now O(log log_b U)) times the number of
distinct elements.

6 Conclusions 

The approximate data structures described in this pa- 
per are simple and efficient. No large constants are hid- 
den in the asymptotic notations — in fact, a "back of the 
envelope" calculation indicates significant speed-up in 
comparison to the standard van Emde Boas data struc- 
ture. The degree of speed-up in practice will depend 
upon the machines on which they are implemented. 
Machines on which binary arithmetic and bitwise op- 
erations on words are nearly as fast as, say, compari- 
son between two words will obtain the most speed-up. 
Practically, our results encourage the development of 
machines which support fast binary arithmetic and bit- 
wise operations on large words. Theoretically, our re- 
sults suggest the need for a model of computation that 
more accurately measures the cost of operations that 
are considered to require constant time in traditional 
models. 

The applicability of approximate data structures to 
specific algorithms depends on the robustness of such 
algorithms to inaccurate intermediate computations. 
In this sense, the use of approximate data structures 
has an effect similar to computational errors that arise 
from the use of finite precision arithmetic. In recent 
years there has been an increasing interest in studying 
the effect of such errors on algorithms. Of particular 
interest were algorithms in computational geometry. 
Frameworks such as the "epsilon geometry" of Guibas, 
Salesin and Stolfi @ may be therefore relevant in our 
context. The "robust algorithms" described by Fortune 
and Milenkovic ||, ^, [2l], [2^, |23) are natural candidates 
for approximate data structures. 

Expanding the range of applications of approximate
data structures is a fruitful area for further research.
Other possible candidates include algorithms in com- 
putational geometry that use the well-known sweeping 
technique, provided that they are appropriately robust. 
For instance, in the sweeping algorithm for the line arrangement
problem with approximate arithmetic, presented
by Fortune and Milenkovic [9], the priority queue
can be replaced by an approximate priority queue with
minor adjustments, to obtain an output with similar
accuracy. If the sweeping algorithm of Chew and For- 
tune can be shown to be appropriately robust then 
the use of the van Emde Boas priority queue there can 
be replaced by an approximate variant; an improved 
running time may imply better performance for algo- 
rithms described in || . 